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Abstract 

Slow feature analysis (SFA) is a method for extracting slowly varying features from a quickly varying 
multidimensional signal. An open source MATLAB-implementation sfa-tk makes SFA easily useable. 
We show here that under certain circumstances, namely when the covariance matrix of the nonlinearly 
expanded data does not have full rank, this implementation runs into numerical instabilities. We propse 
a modified algorithm based on singular value decomposition (SVD) which is free of those instabilities 
even in the case where the rank of the matrix is only less than 10% of its size. Furthermore we show 
that an alternative way of handling the numerical problems is to inject a small amount of noise into the 
multidimensional input signal which can restore a rank-deficient covariance matrix to full rank, however 
at the price of modifying the original data and the need for noise parameter tuning. 

1 Introduction 

Slow feature analysis (SFA) is an information processing method proposed by Wiskott and Sejnowski (WS02) 
which allows to extract slowly varying signals from complex multidimensional time series. Wiskott (Wis98) 
formulated a similar idea already before as a model of unsupervised learning of invariances in the visual 
system of vertebrates. SFA has been applied successfully to numerous different tasks: to reproduce a wide 
range of properties of complex cells in primary visual cortex (BW05) , to model the self-organized formation 
of place cells in the hippocampus (FSW07), to classify handwritten digits (Ber05) and to extract driving 
forces from nonstationary time series (Wis03). 

The analysis of nonstationary time series plays an important role in the data understanding of various 
phenomena such as temperature drift in experimental setup, global warming in climate data or varying heart 
rate in cardiology. Such nonstationarities can be modeled by underlying parameters, referred to as driving 
forces, that change the dynamics of the system smoothly on a slow time scale or abruptly but rarely, e.g. if the 
dynamics switches between different discrete states. (Wis03). Often, e.g. in EEG-analysis or in monitoring 
of complex chemical or electrical power plants, one is particularly interested in revealing the driving forces 
themselves from the raw observed time series since they show interesting aspects of the underlying dynamics. 

We recently studied the notion of slowness in the context of driving force extraction from nonstationary 
time series (Kon09). There we used sfa-tk (Bcr03), a freely available MATLAB-implementation of SFA, 
to perform our experiments. These experiments requested to perform driving force extraction over a wide 
range of parameters. To our surprise in some seemingly very 'simple' time series which had a high regularity 
the original sfa-tk- algorithm failed and produced 'slow' signals violating certain constraints as well as the 
slowness condition. We analyzed the problem and finally traced its source back to a rank deficiency in the 
expanded data's covariance matrix. In this paper we present this analysis and develop a new implementation 
which solves the problem. 
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This paper is organized as follows: In Sec. 2 we review the properties of the SFA approach and present 
the orignal sf a-tk-algorithm as well as a slightly modified version which we will call SVD_SFA. In Sec. 3 
we show some driving force experiments demonstrating cases where the new SVD_SFA algorithm gives good 
results while the original sf a-tk-algorithm fails. Sec. 4 discusses the properties of SVDJ3FA as well as an 
alternative approach based on noise injection. 



2 Slow Feature Analysis 

Slow feature analysis (SFA) has been originally developed in context of an abstract model of unsupervised 
learning of invariances in the visual system of vertebrates (Wis98) and is described in detail in (WS02; Wis03). 



2.1 The SFA approach 

We briefly review here the approach described in (Wis03). The general objective of SFA is to extract slowly 
varying features from a quickly varying multidimensional signal. 

Given is at time t a raw input signal s = s(t) as an m-dimensional vector s = [si, ...%]. A preprocessing 
step performs a sphering (see Appendix A in Sec. 6) and an optional dimension reduction 

x(t) = W (s(t)-E[s]) (1) 

with E [•] indicating the temporal mean and with Wo as the (n x m) sphering matrix, n < m, whose n rows 
are proportional to the n largest eigenvectors of Cov(s) (PCA). If n = rn, no dimension reduction takes place. 
The preprocessed input x(t) S 5ft™ is in any case mean- free and has a unit covariance matrix E [:ra; T ] = 1. 
(A simplified preprocessing which just normalizes each input component to mean and variance 1 is also 
possible, but does not allow for dimension reduction.) 

For the preprocessed input x(t) the SFA approach can be formalized as follows: Find the input-output 
function g{x) that generates a scalar output signal 

y{t) := g{x{t)) (2) 

with its temporal variation as slowly as possible, measured by the variance of the time derivative: 

minimize A(y) = E [y 2 ] (3) 

To avoid the trivial constant solution the output signal has to meet the following constraints: 

E [v] — (zero mean) , (4) 
E [y 2 ~\ — 1 (unit variance) . (5) 

This is an optimization problem of variational calculus and as such difficult to solve. But if we constrain 
the input-output function to be a linear combination of some fixed and possibly nonlinear basis functions, 
the problem becomes tractable. 

Let v = v(x) £ !ft M be a vector of some fixed basis functions. To be concrete assume that v contains 
all monomials of degree 1 and 2. Applying v to the input signal x(i) yields the nonlinearly expanded signal 

«(*): 

v(x) := [xi, x 2 , x n , x\, xix 2 , xl] T 1 (6) 
v(t) := v(x(t)). (7) 

For reasons that become clear below, it is convenient to sphere (or whiten) the expanded signal and 
transform the basis functions accordingly: 

z(x) := 3 (v(x)-E[v]), (8) 
z{t) := S (v(t)-E[v]), (9) 
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with the sphering matrix S (see Appendix A inSec. 6) chosen such that the signal components have a unit 
covariance matrix, i.e. 

1 = Cov(z) = E [zz T ] = SCov(v) S T , (10) 

which can be done easily with the help of singular value decomposition (SVD). However, such a sphering 
matrix S exists only, if B := Cov(u) has no eigenvalues exactly zero (or close to zero). The signal components 
have also zero mean, E [z] = 0, since the mean values have been subtracted. 

Proposition 1 Let C := E [zz T ~\ be the time-derivative covariance matrix of the sphered signals. Calcu- 
late with SVD the eigenvectors Wj of C with their eigenvalues Ai < A2 < A 3 < i.e. 

Cwj = XjWj (11) 

(we assume here that the eigenvalues are pairwise distinct and that the eigenvectors have norm 1, i.e. 
wjwk = 5jk). Then the output signals defined as 



Vj (t) = wfz(t) (12) 



have the desired properties: 



E [ijj] = (zero mean) , 
E [uj] = 1 (unit variance) , 
E[yjyk] = (decorrelation) , j ^ k , (13) 
E[y/} = A,. 

Proof The first property is obvious since E [z] = 0, the second and third property are direct consequences 
of the sphered signal and the orthonormality of the eigenvectors: 

E [ViVk] = E [wfz(t)z(t) T w k ] =wjE [zz T ] w k = 5 jk 

=1 

Finally the fourth property comes from the eigenvalue equation: 

E [yj 2 ] = E [wjz{t)z{t) T w j ] = wfE [zz t ] w = wfCwj = wjXjWj = \ .4 

Thus the sequence yi(t) , 1/2(^)1 ••• constitutes a series of slowest, next-slowest, next-next-slowest, ... sig- 
nals, where each signal is completely decorrelated to all preceeding signals and each signal is the slowest 
signal among those decorrelated to the preceeding ones. 

Proposition 2 If B := Cov(v) is regular (i.e. it has no zero eigenvalues and thus S and S^ 1 exist) then 
(and only then) Eq. (11) is equivalent to the generalized eigenvalue equation of Berkes (Ber03): 

C'w'j = XjBw'j with C := E [vv T ] (14) 

and both eigenvalue equations have the same eigenvalues Xj. 

Proof With z — Sv we have 

C = E [zz T ] = SE [vv T ] S T = SC'S T (15) 

Using this and the identity Eq. (27) from the Appendix A in Sec. 6 we can rewrite Eq. (11) 

Cwj = XjWj 
SC'S T Wj = X j SBS T w J 

—w'. 

3 

C w'j = Xj B vo'j 
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where we have multiplied with S~ x from the left in the third line. A 

Basically, SFA consists of the following four steps: (i) expand the input signal with some set of fixed 
possibly nonlinear functions; (ii) sphere the expanded signal to obtain components with zero mean and 
unit covariance matrix; (iii) compute the time derivative of the sphered expanded signal and determine 
the normalized eigenvectors of its covariance matrix; (iv) project the sphered expanded signal onto this 
eigenvectors to obtain the output signals yj. In the following Sections 2.2 and 2.3 we present two different 
ways of implementing the SFA approach. 



2.2 Algorithm GEINLEIG 

Algorithm GENJ5IG is based on Proposition 2 and is the SFA algorithm as implemented in sf a-tk (Bcr03) 
and as described in (Bcr05). Given is at time t a raw input signal s(t) € 3? m . Let n be the dimension of 
the preprocessed input x(t) and M be the dimension of the expanded signal v(t) (e.g. with monomials of 
degree 2). 



Unsupervised training on training signal s(t) 

l: x(t) := Wo (s(t) — s ) S Sft n , s := E [s] > Preprocess the raw input signal (sphering, PCA), Eq. (1) 

2: v(t) := v(x(t)) € 5R M , Vq := E [v] > Expand the preprocessed signal, see for example Eq. (6) 

3: B :— Cov(v) > Covariance matrix of expanded data, Eq. (22) 

4: C :— E [i) , w T ] > Time-derivative (v) covariance matrix, Eq. (14) 

5: Solve C w'j = XjBw'j for {Xj, w'j} > Generalized eigenvalue equation, Eq. (14) 
6: Return {Wo, s , v , Xj,w'j \ j = 1, . . . , M} 

The results returned from the training step can be applied to any input signal s(t) in the following way to 
yield the slow signals yj(t): 

Application to (training or new) signal s(t) 
l: x(t) = Wo (s(<) — s ) G R n > Preprocess the raw input signal (sphering, PCA), Eq. (1) 

2: v(t) — v(x(t)) € 3f M > Expand the preprocessed signal, see for example Eq. (6) 

3: Vj(t) — w 'j F ( v (t) ~ v o) 3ft, j = 1, . . . , M > yi(t): slowest, y2(t)- 2nd slowest signal and so on 

4: Return { Vj {t) \ j = 1, . . . , M} 

Algorithm GENJ5IG works fine as long as B does not contain zero (or close to zero) eigenvalues. But if 
B becomes singular (or close to singular) then Proposition 2 does no longer hold and we will see in Sec. 3 
that the slow signals yj may become wrong. Therefore we reformulated the SFA approach in such a fashion 
that we can control (sort out) zero (or very small) eigenvalues and present the result as algorithm SVD_SFA. 



2.3 Algorithm SVD SFA 

Algorithm SVD_SFA is based on (WS02) and Proposition 1. It is implemented as an extension of sfa-tk. 
The application part of SVD_SFA is the same as in algorithm GEN_EIG. The unsupervised training part is 
the same for Steps 1.-4., but somewhat different from Step 5. on: 



Unsupervised training on training signal s(t) 

x(t) :— Wo (s(t) — s ) € s := E [s] > Preprocess the raw input signal (sphering, PCA), Eq. (1) 

v(t) :— v(x(t)) € 5i M , v := E [v] > Expand the preprocessed signal, see for example Eq. (6) 



B := Cov('u) > Covariance matrix of expanded data, Eq 

C := E [ii) T ] > Time-derivative (v) covariance matrix, Eq 

Find S such that SBS T = 1 > Sphere expanded v with SVD, see Eq. (24), Eq 

C := SC'S T > Time-derivative (z) covariance matrix, Eq 

Solve C Wj = Xj Wj for {Xj, Wj} > Eigenvalue equation, Eq 



Return {W , s Q ,v , X^w'^ \ j = 1, . . . , P} 



(22) 
(14) 
(29) 
(15) 
(11) 
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Figure 1: Each diagram shows the SFA-output yi(t) (blue dots) and the aligned true driving force A(^(t); y(t)) 
(red line) (see Eq. (19)). The SFA-output was generated from a logistic map time series w(t) (Eq. (17)) 
with parameters q — 1.2, r = 1 and the driving force of Eq. (16). The upper row shows the GEN_EIG 
implementation and the lower row the SVD_SFA implementation. Left: m — 4, right: m = 8. We see 
that already at embedding dimension m — 8 the GEN_EIG results starts to deteriorate, since it has only an 
amplitude in the order 0/ICP 7 . 



The main difference is Step 5. where the sphering matrix S for B :— Cav(v) is calculated with SVD 
according to the lines described in Appendix A (Sec. 6). This works even in the case that B is singular. In 
that case some M — P rows of S are zero (more precisely: each row whose eigenvalue fulfills the 'close-to-zero' 
condition of Eq. (28)). These rows are either directly excluded from S, making it a (P x M) matrix, or they 
are kept and lead to M — P eigenvalues Xj of C which are exactly zero and which are then excluded from 
further analysis. In the case where B is regular, we have P — M and no eigenvalue is zero. In any case, 
algorithm SVD_SFA will return P non-zero eigenvalues, P < M, and their corresponding eigenvectors. 

In Step 6. of the unsupervised training we calculate C = E [zz T ] without the need for calculating z 
explicitly. Likewise, Step 8. prepares w'j in such a way that 

wf (v(t) - v ) = wj S (v(t) - v ) = wjz(t) 

so that Step 3. of the application part leads to the output signal as requested by Proposition 1. 

Why does the sphered expanded signal z(t), which is central to Proposition 1, not appear in algorithm 
SVD_SFA? The algorithm is deliberately implemented in such a way that direct access to z(t) is not necessary. 
The reason is that for large input sequences the implementation of sf a-tk allows to pass over the input data 
chunk- by- chunk into the expansion steps (2.-4.), where B and C" are formed, which is considerably less 
memory-consuming. But only when B is established at the end of the expansion phase, we can calculate S. 
If a direct representation of z(t) — S (y(t) — Vq) were necessary, another chunkwise pass-over of the input 
would be necessary and considerably slow down the computation. With the indirect method in Steps 6. and 
8. we accomplish the same result without the need for z(t). 

In the rest of this paper we will work with time series x(t{) instead of continuous signals x(t), but the 
transfer of the algorithms described above to time series is straight forward. The time derivative is simply 
computed as the difference between successive data points assuming a constant sampling spacing At = 1 . 



3 Experiments 

In the following we present examples with time series w(t) derived from a logistic map to illustrate the 
results with different implementations of SFA. The underlying driving force is denoted by ^(t) and may 
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Figure 2: Same as Fig. 1, only for larger to — 12 and m — 18. Now the results from GEN_EIG are 
completely deteriorated, they are by no means slow nor do they have unit variance, while SVDSFA still 
gives good results. 



vary between —1 and 1 smoothly and considerably slower (as defined by the variance of its time derivative 
(3)) than the time series w(t). The approach follows closely the work of Wiskott (Wis03) and own previous 
work (Kon09), using a very simple driving force in the shape of just one sinusoidal frequency component: 

= sin(0.0125i) G [-1,1] (16) 

The logistic map is formulated with a parameter q £ [0.1, 3.9] allowing to control the presence or absence of 
chaotic motion (see (Kon09) for details): 

w(t + 1) = (4.0 -q + 0.1j(t))w(t)(l - w(t)) , (17) 

Taking the time series w(t) directly as an input signal would not give SFA enough information to estimate 
the driving force, because SFA considers only data (and its derivative) from one time point at a time. Thus 
it is necessary to generate an embedding- vector time series as an input. Here embedding vectors at time 
point t with delay r and dimension to are defined by 

x (t) := [w(t-T(m-l)/2), u>(t - <r((m - l)/2 - 1)), w(t + t(to - 1)/2)] T , (18) 

for scalar w(t) and odd to. The definition can be easily extended to even to, which requires an extra shift 
of the indices by r/2 or its next lower integer to center the used data points at t. Centering the embedding 
vectors results in an optimal temporal alignment between estimated and true driving force. 

In order to inspect visually the agreement between a slow SFA-signal and the driving force 7(f) we must 
bring the driving force into alignment with y(t) (the scale and offset of slow signals y(t) extracted by SFA 
are arbitrarily fixed by the constraints and the sign is random). Therefore we define an y -aligned signal 

A( 1 (t);y(t)) = a 1 (t) + b (19) 

where the constants a and b are chosen in such a way that 

E [(a7(t) + b - y(t)) 2 ] = Min. (20) 

The following simulations are based on 6000 data points each and were done with Matlab 7.0.1 (Release 
14) as well as with Matlab 7.6.0 (R2008a) using the SFA toolkit sf a-tk (Bcr03) and our extensions to it. 

Fig. 1 and Fig. 2 show the results for some embedding dimensions in the case of the logistic map with 
q = 1.2. The GEN_EIG results in Fig. 2 are completely corrupted by numerical errors and do not show 
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Table 1: Results from the implementation experiments where index G denotes GEN_EIG and index S denotes 
SVDSFA. m: embedding dimension, N G : number of eigenvalues of the generalized eigenvalue equation, N$: 
number of eigenvalues of Cov(z), the constraints E[yi] and E \yf] ace. to Eq. (4), Eq. (5) and the slowness 
indicator r\ (WS02; Kon09). Low Tj-values indicate slow signals, high rj-values fast signals. For algorithm 
GEN_EIG and embedding dimension m > 8 neither the unit variance constraint E [j/J] G = 1 nor the slowness 
principle for r\Q are fulfilled. 
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anything 'slow'. Table 3 shows indicative numbers for the same experiment. The number of eigenvalues N G 
is always equal to the embedding dimension M which is for m > 8 much larger than the true dimensionality 
of the expanded data. In contrast, the number Ns of (non-zero) eigenvalues returned from SVD_SFA 
approaches a limiting value, in this case 26, as m increases. The mean value E [yi] is always correct since 
yi is built from mean-free components (cf. Eq. (12)). The unit-variance constraint E [yf\ G = 1 is violated 
for m > 8 in the case of GENJ5IG. Likewise, the slowness indicator n G becomes orders of magnitude larger 
than 11.8 which is the slowness of the true driving force (i.e. the signal is very fast). 




GEN_EIG 

• SVD_SFA 



5 10 15 20 25 30 35 40 45 

index 

Figure 3: Eigenvalues Xj for both algorithms GEN_EIG (red line) and SVD_SFA (blue dots) for the settings 
of Fig. 1 and embedding dimension m — 8, where B is singular. The red line is broken whenever a negative 
or complex eigenvalue occurs for GEN_EIG. 
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Table 2: Rank of matrix B — Cov(v) as a function of embedding dimension m and noise injection amplitude 
o~ v . M is the size of the expanded signal v which equals the number of rows and columns of B. For each 
noise amplitude lower than a v = 10~ 4 certain rank deficiencies occur. 










io- iu 


io~ 8 io- 6 


10" 4 




m 




rank(B) 


M 


2 




5 


5 


5 


5 


5 


5 


4 




14 


14 


14 


14 


14 


14 


8 




24 


28 


31 


43 


44 


44 


10 




30 


39 


48 


64 


65 


65 


12 




32 


54 


69 


88 


90 


90 


20 




35 


141 


189 


227 


230 


230 


30 




35 


236 


406 


490 


495 


495 



4 Discussion 

We can summarize the above experiments as follows: the standard implementation GEN_EIG of SFA in 
sf a-tk is likely to fail if B, the covariance matrix of the expanded data v, becomes singular. The fact 
that B becomes singular indicates that the dimension of the expanded space is higher than the 'true' 
dimensionality of the data. The failure can be traced back to Matlab's routine eig(A,B) for calculating 
generalized eigenvalues, which is said according to Matlab's Help to work also in the case of singular B 
but apparently is not. 

We show in Fig. 3 the eigenvalues Xj for both algorithms GEN_EIG and SVD_SFA for one example with 
singular B (embedding dimension m — 8). It is seen that in the case of GENJ5IG the line of eigenvalues 
is interrupted several times which is due to the fact that the corresponding eigenvalues are either negative 
or complex and can not be shown on a logarithmic scale. Clearly negative or complex eigenvalues should 
not occur for symmetric matrices B and C. If we use SVD (command svd in Matlab) to calculate the 
eigenvalues of the singular, symmetric matrices B and C we find only real, nonnegative eigenvalues, as it 
should be. 

Does a singular matrix B frequently occur? A singular matrix B is only likely to occur if the data 
show a high regularity as for example in the synthesized time series of our experiments above. For data 
from natural sources or data with a certain amount of noise a singular B is not likely to happen, at least 
if the length of the time scale is not shorter than the embedding dimension. Even in our experiments with 
the synthesized logistic map, if we move to the chaotic region q < 0.4 then no singular B is observed, not 
for low and not for high embedding dimensions m, and the GEN_EIG algorithm of sf a-tk works as well as 
SVD.SFA. 

Thus for natural data or for data with noise it is very unlikely to see a singular B and this is perhaps 
the reason why the weakness of the GEN_EIG algorithm remained so far unnoticed. 

But in certain circumstances (regular data or small amount of data, as it may occur more frequently in 
classification applications with a limited number of patterns per class) a singular B can nevertheless happen 
and it is good to have with SVD_SFA a numerically stable approach. 

Noise injection Another way of dealing with a singular or rank-deficient B is to add to the original time 
series a certain amount of noise, i.e. to perform a noise injection. If we replace for example 

w(t) <- w{t) + v(t) (21) 

where v(t) is mean-free, normal-distributed noise with standard deviation o v = 10~ 4 , then matrix B has 
full rank for all embedding dimensions m = 4, . . . ,30, thus GENJ5IG will work as well as SVD_SFA. (Of 
course the data are disturbed to a certain extent.) If we try to lower er„ to 10~ 6 , . . . , 10~ 10 then B becomes 
gradually more rank-deficient (from 1% to 50% of the dimension of B) and in parallel the performance of 
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Table 3: Standard deviation E \yf\ of the slowest signal from GEN_EIG as a function of embedding dimen- 
sion m and noise injection amplitude a v . We see that even in a case where the rank- deficiency of B is only 
1 out of 65 dimensions (e. g. for m — 10, a v = 10~ 6 , see Tab. 2), a constraint-violating solution might 
happen. In rare cases a rank- deficient B might produce a correct y\ ( e. g. for m — 8, av = but this 

is more the exception than the rule. 
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GENJEIG degrades in a roughly proportional way (see Tab. 2 and Tab. 3). There might be certain cases 
where GEN_EIG detects the correct slow signal with the correct constraints, but this can not be guaranteed. 

Dependency on e Is the cutoff parameter e in Eq. (28) for the 'close-to-zero '-condition of eigenvalues 
critical for the SVD_SFA algorithm? One might suspect this to be the case since SFA in its slowest signal 
relies on the smallest eigenvalue. But first of all, the eigenvalues in Eq. (28) refer to B while the slowest 
SFA signal has a small eigenvalue in C = E [vv T ~\ . Secondly, a small eigenvalue occuring in natural, noisy 
data will usually be not smaller than 10~ 3 while the 'close-to-zero '-condition is typically in the order of 
10~ 7 . . . 10~ 15 (machine accuracy) . Nevertheles we tested several other values e = 10~ 6 , 10~ 9 , 10~ 12 instead 
of the usual 10 -7 and found virtually the same results. (Only e as large as 10~ 5 resulted in a phase shift of 
the output signal.) Thus we conclude that the dependency on e is not critical over a large range, at least 
not in our experiments. 

5 Conclusion 

We have shown that the standard SFA algorithm GEN_EIG based on generalized eigenvalues and imple- 
mented in the Matlab version of sf a-tk yields under certain circumstances wrong results (in terms of the 
constraints and in terms of the slowness) for the slow SFA signals due to numerical instabilities. Those 
circumstances can be characterized as: "The covariance matrix B of the expanded data is rank-deficient". 

We have presented with SVD_SFA a new algorithmic implementation which follows more closely the SFA 
approach of Wiskott and Sejnowski (WS02) and which is numerically stable for all tested rank-deficient 
matrices B. The MATLAB-implementation is freely available for download. 1 With a certain trick we can 
avoid the direct representation of the sphered expanded data and thus can implement the new SVDJ3FA 
algorithm as efficiently as the original GENJEIG algorithm. The new algorithm has roughly the same 
execution times as the old one since the time-consuming parts (expanding the data and accumulating the 
relevant matrices) remain the same. In our experiments the rank deficiency span a range between 1% and 
90% of the number of dimensions in expanded space (see Tab. 2). The single new parameter added by 
SVD_SFA, namely the eigenvalue cutoff threshold e, was shown to be uncritical: the results obtained were 
insensitive to e-variation over a span of five decades. Thus the new algorithm does not need more parameter 
tuning than the old one, but it is more robust. 

We have shown analytically that both implementations are equivalent as long as matrix B is regular (has 
full rank). 

1 see Appendix B in Sec. 7 for information how to download and use the extended package sfa-tk.V2 
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An alternative approach to reach numerical stability is to avoid rank-deficient matrices by adding a 
certain amount of noise to the original data (noise injection). It has been shown that the original GEN_EIG 
algorithm can be stabilized with the right amount of noise for all embedding dimensions m. Noise injection 
has however the drawback that certain parameters of the noise (noise amplitude, noise distribution and so 
on) have to be carefully tuned anew for each new SFA application. 

In this work the new algorithm SVD_SFA has been applied only to driving force experiments. We plan to 
apply it also to SFA classification applications in the near future, where singular matrices might also occur 
in the case of smaller number of patterns per class. 

Although we have with SVD_SFA now a robust algorithm, a general advice from the results presented 
here is to take always a look at rank(B), the rank of the expanded data's covariance matrix, when performing 
SFA. Sometimes it might be worth to think about the possible reasons for a rank deficiency (e.g. too large 
to or too few data) and to modify the experiment conditions accordingly. 

6 Appendix A 

We review in this appendix some basic properties regarding covariance matrices and sphering matrices. 

Given is a set of K data points v = [vi, vm] T £ 3? M with mean v = E [v]. Here E [•] denotes the 
average over all K points in the set. The covariance matrix associated with v is defined as 

B = Cov(t>) =E[(v- v )(v - v ) T ] = E [vv T ] - v v% (22) 

Usually the covariance matrix deviates from the unit matrix because it has 

1. non-zero off-diagonal elements signaling dependencies between the dimensions of v and 

2. varying diagonal elements showing that different dimensions of v carry different amounts of the total 
variance of the signal. 

Case 1: B = Cov(d) has no zero eigenvalues. Sphering or whitening a set of data points v means to 
search a linear transformation S to obtain 

z = S(v- v ) such that E[z]=0 and Cov(;z) = E [zz T ] = 1 (23) 

If no eigenvalues are zero then the sphering transformation S exists and is given by 

S=D-V*R=( 1/Vri ••• )( ) = ( ri/VTl ) (24) 

where is the fcth eigenvector of Cov(w) with eigenvalue Afc and unit length and where R is the matrix 
containing these eigenvectors in rows. D is the diagonal matrix of all eigenvalues. Usually, D and R are 
obtained by singular value decomposition (SVD). One can easily verify that 



and with this 



RCov(v)R T = D (25) 

E[zz T ] = D- l l 2 RE[{v-v ){v-v Q ) T ] R T D 1 ' 2 

= D- 1 / 2 RCov(v)R T D- 1 / 2 
s * 

=D 

= 1 (26) 
An obviously equivalent way of writing this is 

SGov(v)S T = SBS T = 1 (27) 
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Case 2: B = Cov(d) has eigenvalues Xj = 0. The above sphering approach of course runs into problems 
if there are eigenvalues \j — or close to zero. The corresponding entries in J? -1 / 2 would become infinity or 
very large and would amplify any noise or roundoff-errors multiplied with these entries. The standard trick 
from SVD (PTVF92) to deal with singular matrices B or D is to replace in D -1 / 2 each 1/y/Xj ~ 1/0 with 
(!), thus effectively removing the corresponding eigenvector directions from further analysis. The condition 
for 'close to zero' is defined in relation to the largest eigenvalue 

^■j I ^max — £ 

where we usually set e = 1CP 7 . The matrix S becomes 

s " r v -o)(:;l :) 

i.e. it has a 0-row for each eigenvalue close to zero. S is not invertible, it projects into a subspace of 
dimension P < M where P is the number of non-zero eigenvalues of B (non-zero rows of S) . The covariance 
matrix E [zz T ] = S B S T in Eq. (26) becomes now a diagonal matrix with a 1 for each non-zero eigenvalue 
and a for each eigenvalue close to zero. 

Our investigation above has shown that the results achieved with such a sphering approach are numerically 
stable in contrast to the generalized eigenvalue approach (Bcr03) which has numerical problems with very 
small eigenvalues. 



o 



(28) 



(29) 



7 Appendix B: sfa-tk.V2 

We briefly summarize in this appendix some information on how to obtain and use the extended package 
sfa-tk.V2. 

The package sfa-tk.V2 with the new SVDJ3FA algorithm can be downloaded from http://www.gm. 
fh-koeln.de/~konen/research/projects/SFA/sfa-tk.V2.zip. It is a slightly modified version of Pietro 
Berkes' sfa-tk which is available from http://people.brandeis.edu/~berkes/software/sfa-tk (Bcr03). 
Installation of sfa-tk. V2 is the same as with sfa-tk and all features of sfa-tk are maintained. 

sfa-tk. V2 contains - besides the new SVD_SFA algorithm - also routines for SFA-classification, based 
on the ideas of (Ber05) together with Gaussian classifier routines. Two new demo scripts, namely drivel .m 
and class_demo2 .m illustrate the usage of the new functionalities. A call of the new algorithm looks for 
example like 

[y, hdl] = sfa2(x, 'SVD_SFA') ; 



The main modifications of sfa-tk. V2 are 

• lcov_pca2.m: covariance and sphering matrices are calculated with the more robust SVD method, 
same interface as lcov_pca.m. 

• sfa_step.m: new parameter method, handed over to sfa2_step.m. 

• sfa2_step.m: 

- In step 'sfa': method='GEN_EIG' is the original (Ber03)-code. method='SVD_SFA' is the new 
method along the lines of [WisSej02], using lcov_pca2.m. If opts.dbg>0, then both branches are 
executed and their results are compared with sfa2_dbg.m. 

— In step 'expansion': method='TIMESERIES' is the original (Ber03)-code where the goal is to 
minimize the time difference signal. The method='CLASSIF' is new for classification purposes, 
along the lines of (Bcr05). Each data chunk is assumed to be a set of patterns from the same 
class, and the goal is to minimize the pairwise pattern difference. 

• sfa2.m: new parameters method and pp_type. 
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• sfa2_dbg.m: perform certain debug checks and print out diagnostic information. 

• drivel.m: demo script for performing the driving force demo experiments. 

• class_demo2 .m: demo script for performing classification experiments along the lines of (Ber05). In- 
stead of handwritten digits we use the Vowel benchmark dataset from UCI machine learning repository 
(http : / /archive . ics . uci . edu/ml) 

The full list of modifications is available in file CHANGES.htm in the download package sfa-tk. V2.zip. 
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